Mark D. Scheuerell
Northwest Fisheries Science Center, National Oceanic and Atmospheric Administration, Seattle, WA USA
Nick Golding
Quantitative and Applied Ecology Group, School of BioSciences, University of Melbourne, Melbourne, Victoria, AUS
DISCLAIMER
This vignette is still in the testing and evaluating phase and should not be considered complete or error-free.
This is version 0.18.01.10.
Dynamic Factor Analysis (DFA) is a dimension reduction technique specific to time series analysis. The general idea is to model \(N\) time series as a linear combination of \(M\) “hidden”, time-varying factors, where \(M \ll N\). For an \(N \times T\) matrix of data \(\mathbf{y}\), where \(\mathbf{y}_t\) is an \(N \times 1\) column vector, the DFA model is
\[ \mathbf{y}_t \sim \text{MVN}(\mathbf{Z} \mathbf{x}_t + \mathbf{a},\mathbf{R}) \] \[ \mathbf{x}_t \sim \text{MVN}(\mathbf{x}_{t-1},\mathbf{Q}) \] \[ \mathbf{x}_0 \sim \text{MVN}(\mathbf{0},\mathbf{Q}_0) \]
The \(N \times M\) matrix \(\mathbf{Z}\) maps the factors onto the observed data at time \(t\). The \(N \times 1\) vector \(\mathbf{a}\) contains offsets for each of the observed time series. The covariance matrix \(\mathbf{R}\) of the observation errors can be anything from simple IID errors (i.e., same variance parameter down the diagonal ans zeroes elsewhere), to something much more complex like an unconstrained block diagonal.
The factors are modeled as random walks where the covariance matrix \(\mathbf{Q}\) of the process errors governing their evolution is generally assumed to be an \(M \times M\) identity matrix, \(\mathbf{I}_M\). The covariance matrix \(\mathbf{Q}_0\) of the initial states \(\mathbf{x}_0\) is typically assumed to have large values along the diagonal and zeros elsewhere.
Estimating the parameters in a DFA model is not trivial. There are several available methods for doing so, including an EM algorithm as implemented in the MARSS package. However, the fitting process can be extremely slow when working with really large data sets (e.g., 50+ time series).
For example, a couple of years ago MDS helped a graduate student with a DFA analysis based on estimates of daily growth rates for juvenile Chinook salmon. She had 100+ time series from four different year/month combinations, each of which was about 40-180 days long. We fit our models with MARSS using a distributed cluster from Domino, but the models still took days to converge.
The greta package was designed by NG to do Hamiltonian Monte Carlo via Google’s TensorFlow computational engine, which makes it really fast on big datasets. It’s also nice because you write the models using normal R syntax.
For this vignette we’re using the development version of greta, which you can install from GitHub via
devtools::install_github('goldingn/greta@dev')
You will also need to install TensorFlow and the tensorflow package for R before greta will work. See the output below for the versions used here.
library(tensorflow)
library(greta)
installed.packages()[c("greta", "tensorflow"), "Version"]
## greta tensorflow
## "0.2.2" "1.4.1"
tensorflow::tf_version()
## [1] '1.4'
In addition, we also make use of the mvrnorm() function in the MASS package, color palettes in the viridis package, and the corrplot() function from the corrplot package.
library(MASS)
library(viridis)
library(corrplot)
Lastly, we need to define a function that returns the cumulative sum of the rows (or columns) of a matrix that we will use for estimating the latent factors \((\mathbf{x})\).
op <- greta::.internals$nodes$constructors$op
cumsum_mat <- function(x, a=1L) {
## cumsums across matrix rows/cols
## rows: a = 1L
## cols: a = 0L
stopifnot(length(dim(x)) == 2)
op("cumsum_mat",
x,
tf_operation = tf$cumsum,
operation_args = list(axis = a))
}
Our general approach is to create a large number of time series, each of which to a greater or lesser degree share some temporal patterns with one another. We’ll use 50 time series that are 30 units long, each of which is a linear combination of 5 different latent trends.
NN <- 50
TT <- 30
MM <- 5
In a DFA model, the rows in the matrix of latent factors \(\mathbf{x}\) are generally assumed to be independent random walks, each of which is a cumulative sum of a sequence of independent process errors.
(Note that after simulating the latent trends, we subtract the mean of each to avoid having to estimate the offsets in the vector \(\mathbf{a}\).)
set.seed(123)
## MM x TT matrix of innovations
ww <- matrix(rnorm(MM*TT, 0, 1), MM, TT)
## MM x TT matrix of latent trends
xx <- t(apply(ww,1,cumsum))
## convert to mean-zero
xx <- xx - apply(xx,1,mean)
The matrix \(\mathbf{Z}\) maps the factors \(\mathbf{x}\) onto the observations \(\mathbf{y}\). We want some of the time series to look more like one another than others, so we randomly draw the loadings from a uniform range of values within specific blocks. (Note that this bit of code only works if NN %% MM == 0.)
## NN x MM loadings matrix
ZZ <- matrix(NA, NN, MM)
LL <- seq(-1,1,length.out=MM+1)
for(i in 1:MM) {
zz <- NULL
for(j in 1:MM) {
zz <- c(zz,runif(NN/MM,LL[j],LL[j+1]))
}
ZZ[sample(NN,NN),i] <- round(zz,3)
}
Now we can use the loadings and some observation errors to create the observed time series. Here we assume that the errors are IID, and their variance is 0.2. We can ignore the additive effect of the offset vector \(\mathbf{a}\) because the expectations of \(\mathbf{x}\) and \(\mathbf{e}\) are both zero.
## obs errors
ee <- t(mvrnorm(TT, matrix(0,NN,1), diag(0.2,NN,NN)))
## NN x TT matrix of observed data
yy <- ZZ %*% xx + ee
It’s hard to tell from this plot how many of the 50 time series are correlated. Here is a plot of the correlation coefficients for all of the pairwise comparisons with them clustered by similarity.
Before we can estimate the model parameters we need to specify distributions for the priors and likelihoods.
To make the model identifiable when \(M\) > 1, the upper right section of \(\mathbf{Z}\) must be set equal to zero. For example, if \(M\) = 4 then
\[ \mathbf{Z} = \begin{bmatrix} z_{11} & 0 & 0 & 0 \\ z_{21} & z_{22} & 0 & 0 \\ z_{31} & z_{32} & z_{33} & 0 \\ z_{41} & z_{42} & z_{43} & z_{44} \\ z_{51} & z_{52} & z_{53} & z_{54} \\ \vdots & \vdots & \vdots & \vdots \\ z_{N1} & z_{N2} & z_{N3} & z_{N4} \end{bmatrix}. \]
Thus, we need to tell greta which of the parameters in \(\mathbf{Z}\) are to be estimated, and which should be fixed at 0. The order of operations is important here because greta does not allow you to reassign elements in an array to fixed values after they have been declared as random.
## loadings matrix
ZZ_est <- zeros(NN,MM)
idx <- lower.tri(ZZ_est, diag = TRUE)
ZZ_est_raw <- normal(0, 10, dim = sum(idx))
ZZ_est[idx] <- ZZ_est_raw
The reason we defined ZZ_est_raw separately here, rather than directly assigning values to ZZ_est is so that we can use the evaluate() function to verify its contents. With evaluate(), we can calculate the value of a greta array by passing a fixed value for those it depends on.
So, for example, if we specified \(\mathbf{Z}\) correctly, the following should return the top of ZZ_est with a 1 for those elements that are to be estimated and a 0 otherwise.
head(evaluate(ZZ_est, list(ZZ_est_raw = rep(1, sum(idx)))))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 0 0 0 0
## [2,] 1 1 0 0 0
## [3,] 1 1 1 0 0
## [4,] 1 1 1 1 0
## [5,] 1 1 1 1 1
## [6,] 1 1 1 1 1
It looks like everything worked as expected.
Defining the priors for the covariance matrix \(\mathbf{R}\) is straightforward. In this case, there is no covariance among the observation errors so we could model them as a set of univariate normals, but we will keep the code general enough to allow for possible covariance(s).
The maximum variance among all of the observed time series is about 12 so we’ll use a half-cauchy prior on the observation variance with a scale of 15 and truncate the upper end of the distribution at 15. (Note that we follow the greta convention of using the ‘=’ assignment for stochastic nodes.)
## diagonal of R
RR_est_raw = cauchy(0, 15, 1, truncation=c(0, 15))
The next step is to specify the likelihood functions.
For the factor model, we need to specify a model for the temporal evolution of the random walks. This is where we take advantage of the cumsum_mat() function we defined above. (Recall that we defined the covariance matrix \(\mathbf{Q}\) to be the identity matrix \(\mathbf{I}\).)
## inital factor
X0 = normal(0, 10, c(MM, 1))
## factors for t = 2-TT
XX = normal(0, 1, c(MM, TT))
## combine & cumsum over rows for the random walk
xx_est <- cumsum_mat(cbind(X0,XX))[,-1]
greta requires a specification for the mean and variance of the multivariate normal distribution. However, rather than draw time-specific samples we will instead 1) create \(NT \times 1\) vectors of the data \(\mathbf{y}\) and their expectations \(\mathbf{Z} \mathbf{x}\); and 2) create an \(NT \times NT\) covariance matrix based on \(T\) replicates of \(\mathbf{R}\) along the diagonal.
## vectorize data
yy_vec <- as_data(matrix(yy, 1, NN*TT))
## vectorize mean
Zx_vec <- c(ZZ_est %*% xx_est)
## create expanded cov matrix
RR_star <- zeros(NN*TT, NN*TT)
diag(RR_star) <- rep(RR_est_raw, NN*TT)
## define likelihood
distribution(yy_vec) = multivariate_normal(Zx_vec, RR_star)
Now we can define the complete model and do some MCMC sampling.
mod_fit <- model(xx_est, ZZ_est, RR_est_raw)
mcmc_smpl <- mcmc(mod_fit, n_samples = 2000, thin = 1, warmup = 1000,
chains = 1, verbose = FALSE)
It is important to determine whether or not the model has converged before we can place any confidence in the results. Among the many diagnostic measures available, the options are somwhat limited when there is only one chain.
One quick option is to visually inspect the MCMC traces with traceplot() from the coda package. Because the output is so voluminous, we only show them for the first 4 estimates.
par(mfrow=c(2,2), mai=c(0.8,0.5,0.2,0.2))
for(i in 1:4) {
coda::traceplot(mcmc_smpl[[1]][,i])
}
These chains clearly show evidence of nonconvergence, suggesting we need to use a longer burnin period.
Another more formal (and faster) option is to use the Heidelberger and Welch’s convergence diagnostic, which is also available in the coda package. In particular, we can ask what percent of the estimates pass the half-width test, which compares the means from beginning and ending sections of the chain.
## number of estimated params/states
n_par <- dim(mcmc_smpl[[1]])[2]
## H&W diag: pass (1) or fail (NA)
halfwidth_test <- coda::heidel.diag(mcmc_smpl[[1]])[,4]
## proportion passing
round(sum(halfwidth_test, na.rm = TRUE) / n_par, 2)
## [1] 0.31
Only about 31% of the estimates appear to have converged, but that isn’t too surprising given the relatively short burnin period and subsequent chain length.
(Note that from here we proceed as though the model had converged.)
We begin by summarizing the MCMC samples.
mod_smry <- summary(mcmc_smpl)
par_means <- mod_smry$statistics[,"Mean"]
Recall that we constrained \(\mathbf{Z}\) in such a way as to choose only one of many possible solutions, but fortunately they are equivalent and can be related to each other by a rotation matrix. Let \(\mathbf{H}\) be any \(m \times m\) non-singular matrix. The following are then equivalent DFA models:
\[ \begin{gathered} \mathbf{y}_t \sim \text{MVN}(\mathbf{Z} \mathbf{x}_t, \mathbf{R}) \\ \mathbf{x}_t \sim \text{MVN}(\mathbf{x}_{t-1},\mathbf{Q}) \end{gathered} \]
and
\[ \begin{gathered} \mathbf{y}_t \sim \text{MVN}(\mathbf{Z} \mathbf{H}^{-1} \mathbf{x}_t, \mathbf{R}) \\ \mathbf{H}\mathbf{x}_t \sim \text{MVN}(\mathbf{H}\mathbf{x}_{t-1},\mathbf{Q}) \end{gathered} \]
There are many ways of doing factor rotations, but a common method is the “varimax”" rotation, which seeks a rotation matrix \(\mathbf{H}\) that creates the largest difference between the loadings in \(\mathbf{Z}\).
The varimax rotation is easy to compute because R has a built in function for this: varimax(). Interestingly, the function returns the inverse of \(\mathbf{H}\), which we need anyway.
ZZ_fit <- par_means[grepl("ZZ",rownames(mod_smry$statistics))]
ZZ_fit <- matrix(ZZ_fit, NN, MM, byrow=FALSE)
## rotation matrix
HH_inv <- varimax(ZZ_fit)$rotmat
## rotated Z
ZZ_rot <- ZZ_fit %*% HH_inv
round(ZZ_rot, 2)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.59 1.06 -0.51 -0.34 -1.99
## [2,] -1.63 0.73 2.09 0.98 0.52
## [3,] 1.13 -1.60 0.00 -0.89 0.91
## [4,] -2.09 -0.09 0.86 0.52 0.75
## [5,] -0.49 0.95 -0.88 -0.37 2.16
## [6,] -0.02 0.83 0.22 -1.89 1.84
## [7,] 0.33 -1.50 1.81 -3.98 0.37
## [8,] -0.04 3.76 0.14 -0.13 0.19
## [9,] -1.78 0.55 -0.17 1.45 1.89
## [10,] 0.05 0.96 0.50 -0.50 0.67
## [11,] 2.31 0.51 0.86 0.44 -0.79
## [12,] 1.05 -1.12 1.24 -2.92 -1.14
## [13,] -2.50 0.48 0.40 1.36 0.95
## [14,] -1.30 2.18 -0.42 0.49 -0.85
## [15,] 0.25 -0.91 -2.29 0.05 0.48
## [16,] -0.21 -2.75 0.50 -0.41 0.41
## [17,] -0.49 0.78 0.31 2.22 -0.99
## [18,] 0.56 -0.37 3.22 -0.88 0.14
## [19,] -0.80 -0.76 -0.04 -2.18 -0.49
## [20,] -1.01 -4.33 -0.86 0.33 0.18
## [21,] -0.96 -0.22 -0.11 -0.28 1.81
## [22,] 0.85 3.61 -0.09 2.37 0.61
## [23,] 1.14 3.12 0.40 3.00 0.76
## [24,] -0.18 -0.90 1.17 0.42 -0.43
## [25,] -0.22 0.21 -0.49 0.31 -2.20
## [26,] -0.68 0.56 1.50 -0.21 2.34
## [27,] 2.70 0.18 -0.86 2.03 -1.44
## [28,] 0.29 0.65 0.63 -0.04 -1.41
## [29,] -0.26 -2.81 -0.85 -1.88 -1.24
## [30,] -0.38 -1.18 -0.64 -0.41 -3.27
## [31,] -0.36 -3.51 0.48 -1.03 -0.81
## [32,] 0.05 0.31 -1.54 0.88 -0.08
## [33,] 2.00 0.39 0.53 -0.04 0.66
## [34,] -0.03 -0.64 -2.80 1.84 -0.49
## [35,] 0.80 -0.50 -0.60 -2.51 0.02
## [36,] -0.42 0.19 1.26 -2.32 1.69
## [37,] -0.50 -2.18 0.19 0.93 -1.82
## [38,] 0.59 2.02 -2.00 1.41 0.33
## [39,] -0.90 0.55 0.55 0.28 2.91
## [40,] -0.97 -2.36 2.91 -2.16 2.02
## [41,] 2.74 0.91 -1.67 -0.29 -0.81
## [42,] 1.42 -0.37 1.17 -1.45 -2.32
## [43,] -0.92 0.37 -0.82 2.57 -0.72
## [44,] -1.02 0.10 -0.32 1.57 0.03
## [45,] 1.27 0.11 -2.95 0.53 -1.75
## [46,] -0.29 -1.02 -0.89 1.99 -0.16
## [47,] -0.43 -0.45 -0.91 -0.89 0.83
## [48,] 0.15 0.30 0.41 -2.16 -1.53
## [49,] 0.80 0.35 -0.46 -0.15 -0.32
## [50,] -1.20 1.21 0.10 0.74 0.24
Here are the 5 factors in \(\mathbf{x}\). Note that we need to rotate these just as we did \(\mathbf{Z}\). The top panel has the true factors; the bottom panel shows the estimated factors. Note that there is no way to insure that the specific ordering of the estimated factors will match the true factors.
# fitted factors
xx_fit <- par_means[grepl("xx",rownames(mod_smry$statistics))]
xx_fit <- matrix(xx_fit, MM, TT, byrow=FALSE)
## rotated factors
xx_rot <- solve(HH_inv) %*% xx_fit
It’s difficult to see which factors, if any, align with one another, so here is a graphical representation of their correlation.
par(mai=rep(0.1, 4), omi=rep(0.1, 4))
corrplot(cor(t(xx), t(xx_rot)), method="ellipse",
tl.col = "black", tl.srt = 0, tl.cex = 0.8, tl.offset = 0.7,
cl.cex = 0.8, cl.offset = 0.9, cl.ratio = 0.2)
Here is an estimated of the observation variance.
RR_fit <- par_means[grepl("RR",rownames(mod_smry$statistics))]
round(RR_fit, 2)
## RR_est_raw
## 0.21
The estimate is really close to the true value of 0.2.
Here is a graphical representation of the correlation between the observed and fitted data for the 50 time series. Note that their (row, col) ordering is arbitrary.
## fitted values
yy_fit <- ZZ_rot %*% xx_rot
## corrrelation
cor_yy <- matrix(diag(cor(t(yy), t(yy_fit))), NN/10, 10)
## plots
par(mai=rep(0.1, 4), omi=rep(0.1, 4))
corrplot(cor_yy, method="ellipse",
tl.pos = "n",
cl.cex = 0.8, cl.offset = 0.9,
cl.ratio = 0.2, cl.lim = c(0, 1))